Data Collection

Understanding the data

Show a random example

An image with its 5 captions

Transform the images to 224x224 and normalize them with the ImageNet mean and standard deviation
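As a rough illustration of the normalization step, here is a minimal pure-Python sketch of what happens to a single pixel. The mean and standard deviation are the standard ImageNet channel statistics; the function name `normalize_pixel` is hypothetical and not taken from this repo. Resizing is not shown. If the project uses torchvision (an assumption), the same preprocessing is typically written as `transforms.Resize((224, 224))`, `transforms.ToTensor()` and `transforms.Normalize(mean, std)`.

```python
# ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A pixel near the ImageNet mean color ends up close to zero in every channel.
print(normalize_pixel((124, 116, 104)))
```

After this step each channel is roughly zero-centered with unit scale, which matches the input distribution a pretrained ImageNet encoder expects.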

Network Architecture

Model parameters

Hyperparameter: value

Learning rate: 1e-3

Batch size: 32

Epochs: 150

Dropout rate: 50%

Embedding size: 512

LSTM hidden size: 512

LSTM number of layers: 2
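One possible way to keep these values in a single place is a small frozen dataclass. This is only a sketch: the class and field names are illustrative and not taken from the repo, and the values are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    # Values copied from the hyperparameter list; field names are illustrative.
    learning_rate: float = 1e-3
    batch_size: int = 32
    num_epochs: int = 150
    dropout: float = 0.5  # 50% dropout rate
    embed_size: int = 512
    hidden_size: int = 512  # LSTM hidden size
    num_layers: int = 2  # number of stacked LSTM layers


hparams = Hyperparameters()
print(hparams)
```

Freezing the dataclass prevents a value from being changed by accident halfway through a training run.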

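import torch.nn as nn
import torch.optim as optim

# Loss: cross-entropy over the vocabulary, skipping <PAD> targets
criterion = nn.CrossEntropyLoss(ignore_index=train_dataset.vocab.stoi["<PAD>"])

# Optimizer: Adam with the learning rate listed above (1e-3)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)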
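To make the effect of `ignore_index` concrete, the following pure-Python sketch computes a mean cross-entropy that skips padded targets. It illustrates the idea only and is not PyTorch's implementation; the function name, the toy logits and the choice of `PAD = 0` are assumptions made for the example.

```python
import math


def cross_entropy(logits, targets, ignore_index):
    """Mean negative log-likelihood over targets that are not ignore_index."""
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        # Log of the softmax normalizer, computed stably by subtracting the max.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
        count += 1
    # An all-padding batch is defined as 0.0 here, a simplification for the sketch.
    return total / count if count else 0.0


PAD = 0  # hypothetical index of "<PAD>" in the vocabulary
logits = [[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [3.0, 0.0, 0.0]]
padded = cross_entropy(logits, [1, 2, PAD], ignore_index=PAD)
unpadded = cross_entropy(logits[:2], [1, 2], ignore_index=PAD)
print(padded, unpadded)  # the two values match
```

Because captions in a batch are padded to a common length, ignoring `<PAD>` keeps the model from being rewarded for predicting padding.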

Train

Inference section

Test results

Unsuccessful prediction

Partially successful prediction

Accurate description of the image